Summary. The sequence of the 3’-most 8300 nucleotides of the genome RNA of the Purdue-115 strain
of the transmissible gastroenteritis virus (TGEV), a porcine coronavirus, was determined from cDNA clones.
The available sequence corresponds to the part of the genome (total length > 20 kb) expressed through
subgenomic mRNAs. The 5 subgenomic and the genomic RNA species detected in TGEV-infected cells
form a 3’-coterminal ‘nested’ structure, a unique feature of Coronaviridae.

The transcription initiation site of the TGEV subgenomic RNAs appears to involve the hexameric sequence
5'CTAAAC, which is present upstream from each coding region. In addition to the previously identified
genes encoding the three structural proteins, E2, E1 and N, two regions, X1 and X2, corresponding
to the non-overlapping portions of mRNAs 4 and 3, may code for so far unidentified non-structural polypeptides.
The predicted X1 polypeptide (9.2 kDa) is highly hydrophobic. The sequence of the X2 region allows
the translation of two non-overlapping products, i.e., X2a (7.7 kDa) and X2b (18.8 kDa). No RNA species
able to express the extreme 3’ open reading frame X3 was found.


coronavirus / transmissible gastroenteritis / TGEV / messenger RNAs / genome structure / gene sequence / non-structural
polypeptides (1987)


Résumé. Transmissible gastroenteritis virus (TGEV): partial sequence, organization and
expression of the genomic RNA. The sequence of the 8300 nucleotides at the 3’ end of the genomic RNA
of the porcine coronavirus TGEV (Purdue-115 strain) was established from cDNA clones. Relative
to the whole genome (> 20 kb), this covers all the sequences expressed through messenger RNAs
of subgenomic size. The 5 subgenomic RNA species and the genomic RNA detected in
infected cells form nested, 3'-coterminal sequences, which is characteristic of the mode
of replication of the Coronaviridae. A hexameric sequence, 5'CTAAAC, present just upstream from each
coding region, would constitute the transcription initiation site of the TGEV subgenomic RNAs. In addition to
the previously identified genes for the 3 structural proteins E2, E1 and N, two regions, X1 and X2, corresponding
to the ‘unique’ region of mRNAs 4 and 3, might code for non-structural polypeptides that are
currently unidentified. One of the predicted polypeptides, X1 (9.2 kDa), is extremely hydrophobic.
Two completely distinct products, X2a (7.7 kDa) and X2b (18.8 kDa), might be translated from
mRNA 3. No RNA capable of expressing the coding frame located at the 3' extremity (X3) was
detected.


coronavirus / transmissible gastroenteritis / TGEV / messenger RNAs / genome structure / gene sequence / non-structural
polypeptides


* Author to whom correspondence should be sent.
Abbreviations: bp: base pair; IBV: infectious bronchitis virus; kb: kilobase; MHV: murine hepatitis virus; ORF: open reading frame;
SSC: saline sodium citrate; TGEV: transmissible gastroenteritis virus.



Introduction 


Transmissible gastroenteritis virus (TGEV), an important pathogen of swine neonates, belongs to the
Coronaviridae, a family of enveloped viruses with a large, positive-stranded RNA as their genome [1].
Earlier studies showed that the TGEV genome consists of a single RNA molecule, approximately
20 kb in length, which is polyadenylated and infectious, like that of other coronaviruses [2].
Although the total number of genes encoded has not yet been determined, the TGEV genome codes
for at least four polypeptides on the basis of existing protein and nucleotide data. The virions are
constructed of three polypeptides, the nucleocapsid (N), the membrane (E1) and the spike or
peplomer (E2) polypeptides, the complete sequence of each of which has recently been established
[3-5]. These three genes account for approximately 6.3 kb of coding information. In addition, at
least one non-structural polypeptide is synthesized during virus replication, an RNA-dependent RNA
polymerase, which requires Mg2+ cations and is probably membrane-bound [11].

Expression of the coronavirus-encoded information proceeds through the synthesis of several
distinct mRNA species of subgenomic size. The transcription strategy has been studied in detail in
the murine hepatitis virus (MHV) and infectious bronchitis virus (IBV) models. The intracellular
RNA species (7 and 6 in number, respectively, including the genome RNA) have been shown to form
a nested set with common 3’ ends. The translated sequences correspond approximately to the 5’ portion
which is absent in the next smaller RNA. The subgenomic RNAs contain leader and body sequences
joined through discontinuous transcription. This process relies upon the presence of a short
homologous sequence in each intergenic region, most likely acting as a recognition signal for the
polymerase-leader complex [6-10]. Less information is available concerning TGEV transcription.
The number of subgenomic RNA species reported to be synthesized in infected cells varies from 4 to 9 in the
previous literature [11-14].


The purpose of this paper is, first, to propose a 
model of TGEV genome organization and expres- 
sion based on both sequence analysis of cloned 
virion RNA and characterization of virus-specific
intracellular RNAs, and second, to describe the 
characteristics of additional polypeptides possibly 
encoded by the genome. 


Materials and methods 


Virus and cells 
The Purdue-115 strain of TGEV was propagated in the
PD5 cell line and virions were purified as reported [15].


RNA extraction 

Purified virions were treated with proteinase K (200
units/ml; Merck) and 2% SDS for 30 min at 37°C. RNA
was extracted once with phenol and twice with
phenol-chloroform (1/1) with gentle agitation. After
ethanol precipitation with sodium acetate (0.3 M), the
RNA pellet was resuspended in sterile bidistilled water
and stored at -80°C. The extraction yield was 40-50 µg
of RNA per 1 mg of purified virion.


cDNA synthesis

The purified RNA was denatured with methylmercuric
hydroxide for 10 min at room temperature [16]. The final
concentration of CH3HgOH in the reverse transcription
reaction mix was optimized to 8 mM. The reaction was
carried out at 37°C for 2 h in 50 µl containing: 15 µg
of freshly denatured RNA, RNasin (100 units;
Promega Biotec, Madison), KCl (40 mM), MgCl2 (6 mM),
Tris-HCl (40 mM, pH 8.3, at 37°C), 2-mercaptoethanol
(56 mM, i.e., 7-fold molar excess over CH3HgOH), dATP,
dCTP, dGTP, dTTP (0.5 mM each), [3H]dTTP (100
µCi, 30 Ci/mmol; Amersham), primers pdT12-18
(Pharmacia) or pE2 (sequence-specific 30-mer [5]) (5 µg)
and ‘super’ reverse transcriptase (88 units; Stehelin,
Basel). The reaction was stopped with EDTA (20 mM),
followed by phenol-chloroform extraction. The
RNA-cDNA hybrids were precipitated with ethanol and
2 M ammonium acetate [17]. About 4 µg of cDNA were
obtained from 15 µg of RNA.
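The ratios quoted in this protocol can be checked with a few lines of arithmetic. The sketch below recomputes the molar excess of 2-mercaptoethanol over methylmercuric hydroxide and the mass yield of cDNA from the figures given above; the variable names are illustrative and not part of the original protocol.

```python
# Sanity check of the reagent ratios quoted in the cDNA synthesis protocol.
# All numbers come from the text; the variable names are illustrative only.

mercaptoethanol_mM = 56.0   # 2-mercaptoethanol in the reaction mix
ch3hgoh_mM = 8.0            # final methylmercuric hydroxide concentration

rna_input_ug = 15.0         # denatured virion RNA per reaction
cdna_output_ug = 4.0        # approximate cDNA recovered

molar_excess = mercaptoethanol_mM / ch3hgoh_mM
mass_yield = cdna_output_ug / rna_input_ug

print(f"2-mercaptoethanol excess over CH3HgOH: {molar_excess:.0f}-fold")
print(f"cDNA mass yield: {mass_yield:.0%} of input RNA")
```

The first ratio reproduces the 7-fold excess stated in the protocol; the second expresses the reported recovery (about 4 µg from 15 µg) as a fraction of input RNA.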


RNase T2 treatment 
The RNA-cDNA hybrid material was subjected to
RNase T2 treatment in a volume of 50 µl containing NaCl
(250 mM), sodium acetate (10 mM, pH 4.5) and RNase
T2 (17 units; BRL) (S. Van der Werf, Institut Pasteur,
personal communication). After a 15 min incubation
at 37°C the material was extracted with phenol-chloroform,
desalted in a centrifuged Sephadex G-50 column
and ethanol precipitated using 2 M ammonium
acetate.


Tailing and cloning of cDNA 

Homopolymeric dC tails were added to RNA-cDNA
hybrids (500 ng) by incubation (3 min at 37°C) in a
20 µl reaction mixture containing potassium cacodylate
(100 mM), Tris-base (25 mM, pH 7.6), CaCl2 (1 mM),
DTT (0.2 mM), dCTP (0.2 mM), BSA (0.5 mg/ml; BRL)
and terminal deoxynucleotidyl transferase (675 units/ml;
Pharmacia P.L.). dC-tailed RNA-cDNA hybrids were
annealed to PstI-cut dG-tailed pBR322 (BRL; 1.5 mg/l,
i.e., 2-fold molar excess over RNA-cDNA hybrids) under
the following conditions: 20 mM Tris-HCl, pH 7.4;
300 mM NaCl; 1 mM EDTA; at 62°C for 15 min; at
57°C for 2 h, then cooled to room temperature. The mixture
was used to transform competent E. coli RR1 [18],
which were plated onto L-agar containing 12 mg/ml of
tetracycline. The percentage of ampicillin-sensitive transformants
ranged between 60 and 90% in the different
experiments.


Screening and mapping 

The clones containing an insert exceeding 800 bp were
selected [19]. A map of cloned inserts was obtained by
means of Northern and Southern blot hybridizations and
hexanucleotide restriction enzyme analyses [20]. For Northern
blot experiments, total RNA of TGEV-infected
PD5 cells was extracted by the guanidinium isothiocyanate
technique [21] and loaded on a 0.75% denaturing
agarose gel containing formaldehyde. RNA transferred
onto nitrocellulose was hybridized with nick-translated
[32P]dCTP-labeled plasmids [20]. Filters were washed in
0.1x SSC + 0.1% SDS at 55°C for 1 h. In Southern blot
experiments, identical hybridization and washing conditions
were employed.


DNA sequencing 
Sonicated plasmid fragments ranging from 500 to 700 bp
were subcloned into SmaI-cut M13mp18 phage vector
[22]. The DNA sequence was determined by the chain
termination method [23] using the 17-mer sequencing
primer and [35S]dATP (600 Ci/mmol; NEN) as the
label. The sequencing products were resolved on polyacrylamide
buffer gradient gels [24]. The whole sequence was
determined on both strands. Sequencing data were
analyzed using the Microgenie sequencing program
(March 1985 version, Beckman). The supercoiled plasmid
dideoxy-sequencing method [25] was occasionally
employed to confirm partial sequence data, using oligonucleotide
primers synthesized on a Biosearch 8600 apparatus.


Results 


Generation and mapping of cDNA library 


RNA extracted from purified TGEV, consisting of a large-sized (> 20 kb), homogeneous, potentially
full-length material, was reverse transcribed after oligo(dT) priming. Several discrete cDNA species,
most likely due to the existence of stable secondary structures in genome RNA, were produced (Fig. 1);
a well-defined band of approximately 18 kb, expected to encompass the major structural protein
genes, was visible. This material served to generate the pTG2 library. Six recombinant clones (2.15,
2.21, 2.26, 2.27, 2.40, 2.50) were oriented along the genome by means of Northern hybridization with
size-fractionated RNAs from TGEV-infected cells (Fig. 2). Clone pTG2.21 (and 2.15, data not shown)
contained sequences hybridizing with 6 RNA species, of which the largest one (RNA 1) had the
same size as virion RNA. Clone pTG2.50 hybridized with all species except RNA 6. Clone
pTG2.26 had sequences in common exclusively with RNA 1 and 2, whereas clones pTG2.40 (and 2.27,
data not shown) possessed sequences present only in RNA 1. This result is consistent with the fact that,
in coronaviruses, genome RNA and subgenomic RNA species form a nested set with common 3’ sequences.
Additional clones were probed against clones 2.26 and 2.15, using Southern blotting. All
the selected clones were mapped by restriction enzyme analysis. The overlapping clones were shown
to stretch along the 7 kb DNA (Fig. 3). Clones 2.21, 2.15 and 2.26 were sequenced. Subsequently, a second
library (pTG6) was produced using a synthetic primer, pE2, located 3.8 kb from the 3’ end [5].
The resulting overlapping clones were found to extend the continuum up to 14500 bases (Fig. 3), of which
8300 bases starting from the 3’ end have been sequenced.

Fig. 1. Electrophoresis of cDNA synthesis products. 3H-labeled cDNA material from two different
experiments was analyzed in a denaturing 0.75% alkaline agarose gel. The estimated size of
the major discrete species is given in kilobases.

Fig. 2. Northern blot analysis of TGEV intracellular RNAs. Total RNA from TGEV-infected PD5 cells
was resolved in a formaldehyde 0.75% agarose gel, transferred onto a nitrocellulose
filter, then hybridized with 4 different 32P-labeled plasmids (designated at the left).
An autoradiograph of each blot is shown. Migration was from left to right. The mRNA species detected
are numbered from 1 to 6.
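The logic used to orient clones along a 3'-coterminal nested set can be written down compactly: a probe hybridizes with every RNA whose 5' end lies upstream of the probe. The sketch below is a minimal illustration in Python. The 5' ends of RNAs 2 to 6 are taken from the CTAAAC positions listed in Table I; the value for genomic RNA 1 is a round placeholder for "> 20 kb", and the probe coordinates in the usage lines are hypothetical, since the actual clone coordinates appear only in Fig. 3.

```python
# 5' end of the body sequence of each RNA, as distance (nt) from the 3' end.
# RNAs 2-6: CTAAAC positions from Table I. RNA 1: placeholder for "> 20 kb".
RNA_5PRIME_END = {1: 20000, 2: 8300, 3: 3780, 4: 2760, 5: 2470, 6: 1670}


def rnas_detected(probe_3prime_edge):
    """Return the RNA species expected to hybridize with a probe.

    probe_3prime_edge is the distance (nt) from the genome 3' end to the
    3'-proximal edge of the probe. Every RNA runs from its own 5' end down
    to the common 3' end, so an RNA shares sequence with the probe exactly
    when its 5' end lies farther from the 3' end than that edge.
    """
    return sorted(
        rna for rna, five_prime in RNA_5PRIME_END.items()
        if five_prime > probe_3prime_edge
    )


# Hypothetical probe positions chosen to mimic the four observed patterns.
for edge in (500, 2000, 5000, 9000):
    print(edge, rnas_detected(edge))
```

With these assumed positions, a probe near the 3' end detects all six species, while probes located progressively further upstream lose the smaller RNAs one by one, which is the pattern reported for clones pTG2.21, 2.50, 2.26 and 2.40.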


Nucleotide sequence analysis 


Seven major open reading frames (ORFs) were identified by stop codon analysis (Fig. 4). As
previously reported, the 3 largest ones encode the major structural proteins, E2, E1 and N. In addition,
4 ORFs exceeding 200 bases, designated X2a, X2b, X1 and X3, were detected. The sequence segment
extending from the 3’ end of the E2 gene up to the 3’ end of the genome (3920 nucleotides) is
displayed in Fig. 5 along with the translation of the main ORFs. During the course of this work,
sequences of the E1 and N genes and downstream sequences became available from another group
[3, 26]. As seen in Fig. 5, there were only a few differences between the two sets of data. The stretch
of 111 nucleotides up to the poly(A) is lacking from our data.
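Stop codon analysis of the kind summarized in Fig. 4 amounts to scanning each reading frame for stretches that begin with an ATG and run to the next in-frame stop. The following Python sketch illustrates the idea on the virus-sense strand only. It is not the Microgenie procedure used by the authors; the function name, the return format and the toy sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_length=200):
    """Return (start, end, frame) for ATG-initiated ORFs of >= min_length bases.

    start is the 0-based index of the first ATG after the previous stop;
    end is the index of the terminating stop codon (stop excluded from the
    length). Only the given strand is scanned, in its three frames, and
    ORFs that run off the end of the sequence are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif codon in STOP_CODONS:
                if start is not None and i - start >= min_length:
                    orfs.append((start, i, frame))
                start = None
    return orfs


# Toy sequence (hypothetical): one 15-base ORF in frame 2.
toy = "CC" + "ATG" + "GCA" * 4 + "TAA" + "G"
print(find_orfs(toy, min_length=12))
```

The default threshold of 200 bases mirrors the cutoff quoted above for the additional ORFs; lowering it, as in the toy call, is only needed for short test sequences.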
Fig. 3. Restriction endonuclease map of part of the TGEV genome (14.5 kb). The length and distribution of cDNA clones selected from
the pTG2 and pTG6 libraries are shown. The clones used for sequencing are marked by a solid circle. Open circles indicate clones
which have been partially sequenced using plasmid dsDNA as a template. Bottom: the 5 subgenomic RNA species identified by Northern
hybridization are positioned along the genome map. Restriction enzyme sites shown: BglII, EcoRI, HindIII, HpaI, PstI, PvuII,
XbaI, XhoI.

Fig. 4. Stop codon analysis of the virus-sense RNA. A computer graphical output of the open reading frames within the first 8300
nucleotides from the 3’ end is shown. Stop codons are represented by vertical bars. Bars with an open triangle indicate proximal
ATGs in the corresponding frame. Arrowheads beneath the lower frame mark the position of every 5’CTAAAC hexamer found in
the sequence.


A remarkable feature of the sequence was the presence of an identical hexamer, 5’CTAAAC,
upstream from the E2, X2a, X1, E1, N and X3 ORFs (Figs. 4 and 5). As suggested for MHV and
IBV (see Introduction), these homologous sequences are likely to act as initiation sites for the
transcription of each mRNA species. Accordingly, it was postulated that the CTAAAC located
immediately upstream from the X2a and X1 ORFs should correspond to the start of mRNAs
3 and 4, respectively (see Discussion). The non-overlapping region of mRNA 4 appeared to contain
a single ORF, X1* (246 bases). The predicted sequence of mRNA 3 might allow translation of
two ORFs: X2a, 213 bases long and starting 24 bases downstream from the CTAAAC sequence;
and X2b, 495 bases long and starting 570 bases downstream. Three more points were noted regarding
X2b: 1) no stop codon occurred up to 267 nucleotides upstream from the potential initiation
codon (position 715, Fig. 5); 2) with its 3’ end partially overlapping the X1 ORF, X2b is the sole ORF
to stretch into the ‘unique’ sequence of the adjacent smaller RNA; 3) the sequence of the whole
X2 region was established on 4 independent clones (see Fig. 3). Surprisingly, 2 of them (pTG2.15 and
2.33) lacked the same 13-base sequence (discontinuous box near position 1000 in Fig. 5); this
created an alternative ORF, X2b’, only 294 bases long and ending at position 1019 (stop codon
overlined).
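Locating every copy of the CTAAAC hexamer and expressing its position as a distance from the 3' end, as done for the arrowheads in Fig. 4, is a simple string search. A minimal Python sketch follows; the function name and the toy sequence are hypothetical, and the real 8300-nucleotide sequence is not reproduced here.

```python
def hexamer_sites(seq, motif="CTAAAC"):
    """Return the distance from the 3' end of each occurrence of motif.

    The distance is counted from the first base of the motif to the end of
    the sequence, so a motif starting at 0-based index p in a sequence of
    length n is reported as n - p. Overlapping occurrences are included.
    """
    seq = seq.upper()
    sites = []
    pos = seq.find(motif)
    while pos != -1:
        sites.append(len(seq) - pos)
        pos = seq.find(motif, pos + 1)
    return sites


# Toy sequence (hypothetical) carrying two copies of the hexamer.
toy_genome = "GG" + "CTAAAC" + "ATGAAA" + "CTAAAC" + "TT"
print(hexamer_sites(toy_genome))
```

Applied to the full sequence, a search of this kind would return both the sites that precede ORFs and any internal copy, which then has to be distinguished by inspection of the neighbouring reading frames.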


Discussion 


Organization and expression of TGEV genome 


About 14500 nucleotides of the 3’ end region of the TGEV genome were cloned in the pBR322 vector
and mapped. All clones used in our study were derived by direct cloning of an RNA-DNA
heteroduplex. Judging from the size (up to 5 kb) and distribution of the copy fragments, this simple
method appeared to be as efficient as that of Gubler & Hoffman (dsDNA synthesis using RNase H) in
the case of IBV RNA cloning [27]. Moreover, although we used oligo(dT) instead of random
priming, clones mapped at more than 14 kb from the 3’ end.

The sequenced part, 8300 nucleotides at the 3’ end, spanned the complete portion of TGEV
RNA expressed through subgenomic-size RNAs, whereas the portion left unsequenced presumably
encodes the polymerase. As shown in Fig. 4, the region sequenced comprises the 3 genes encoding
the major structural proteins N, E1 and E2, already identified on the basis of their predicted translation
products [3-5]. Additionally, three regions, X2a, X2b and X1, might code for non-structural or, less
probably, minor structural polypeptides so far unidentified.

*The X1 ORF was observed to contain a 15-base out-of-frame sequence, 5’ATTATATTGATATTA, identical to an in-frame
sequence found near the 3’ end of the E2 gene ([5]; not shown).

Fig. 5. Sequence of the 3’-most 3920 nucleotides of the TGEV genome. The open reading frames are translated in one-letter amino acid
code. The homologous sequences CTAAAC are boxed. The line upstream from the X2b ORF indicates a frame without stop codon.
A glycosylation signal present in the X2b product is underlined. Nucleotide and amino acid differences with another published
sequence (from position 1400 to 3820) are indicated. The 111-base sequence from the star to the poly(A) is taken from [3].

As a striking feature, each coding region (except X2b) was preceded by a short consensus sequence,
5'CTAAAC, similar to those observed in the genomes of MHV (AATCTAAAC, [9]) and IBV
(CTTAACAA, [8]). Thus, we believe that these homologous sequences correspond to the site of
transcription initiation in the TGEV genome. This assumption is strengthened by the finding that the
measured size of the non-overlapping region of each intracellular RNA species was in accordance with
its predicted size (data summarized in Table I). It is worth mentioning that the sequence
CTAAAC was never present internally in a TGEV ORF, except in one case, about 150 bases after the
start of the E2 gene (Table I; [5]). The CTAAAC sequence located upstream from the X3 ORF, for
which no corresponding intracellular RNA species was identified (see below), might also be
non-functional for mRNA transcription. If confirmed, this would suggest that additional factors govern
the reinitiation of the RNA polymerase-leader complex.
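The predicted sizes in Table I follow directly from the hexamer positions: each body sequence is taken as the distance between one CTAAAC and the next one downstream. The Python sketch below recomputes that column. It assumes, as the 4.5 and 4.4 kb entries imply, that both the E2 site and the internal E2 site are measured to the X2a site, and that the last site is measured to the 3' end; the function name and the label "internal" are ours.

```python
# CTAAAC positions (nt from the 3' end) with the adjacent ORF, from Table I.
# "internal" labels the copy found about 150 bases inside the E2 gene.
SITES = [(8300, "E2"), (8150, "internal"), (3780, "X2a"), (2760, "X1"),
         (2470, "E1"), (1670, "N"), (510, "X3")]


def predicted_body_sizes(sites, skip=("internal",)):
    """Return {ORF: size in kb} for the stretch between consecutive sites.

    For each site, the size is the distance to the next downstream site
    whose label is not in skip; the last site is measured to position 0
    (the 3' end). Values are rounded to one decimal, as in the table.
    """
    sizes = {}
    for i, (pos, name) in enumerate(sites):
        downstream = [p for p, n in sites[i + 1:] if n not in skip]
        next_pos = downstream[0] if downstream else 0
        sizes[name] = round((pos - next_pos) / 1000, 1)
    return sizes


for orf, kb in predicted_body_sizes(SITES).items():
    print(f"{orf:>8}: {kb} kb")
```

Comparing these values with the size differences measured between consecutive RNA species on a denaturing gel is the consistency check on which the assignment of CTAAAC as the transcription initiation signal rests.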

Our results demonstrate that TGEV intracellular RNAs form a 3’ co-terminal ‘nested’ set (Fig. 2),
a feature of Coronaviridae. In addition, the RNA species pattern is in agreement with that recently
published by others [14]. Typically, RNA 5 encoding E1 (2.5 kb) and the less abundant RNA 4 (3 kb)
appear to be close to each other in size, unlike what was reported by another group [13]. An additional
poly(A)+ RNA species, 0.7 kb long and rather rare, could have been a candidate for the extreme
3’ ORF called X3. However, it was not detected by Northern hybridization using a cDNA probe [14].
A similar result was obtained in our experiments in which total intracellular RNA was analyzed.

The overall view of our data led us to propose the model of the structure of the TGEV genome
depicted in Fig. 6. Its organization appears to be ‘intermediate’ between those of MHV and IBV.
Like IBV, TGEV possesses 5 subgenomic mRNAs and lacks a subgenomic RNA species larger than
the E2-encoding RNA 3, which exists in MHV. On the other hand, the E1 and N genes are adjacent
in both MHV and TGEV genomes. The coding regions of the TGEV genome are densely packed
overall, yet there are almost no overlaps. The intergenic regions consist of 0-15 bases, except the
E2-X2a junction, which is 120 bases long (Fig. 5). Every subgenomic RNA species appears to be
functionally monocistronic, except RNA 3, which potentially allows the translation of two
non-overlapping products, X2a and X2b. It is noteworthy that MHV RNA 5 and IBV RNA D also possess
a sequence arrangement which might imply an internal initiation of protein synthesis [28, 29]. This
shared feature might be of biological significance, as for instance a deliberate limitation of the
synthesis of the product encoded by the downstream ORF.

Table I. Comparison between the nucleotide position of the homologous regions and the calculated size of the
non-overlapping regions of each subgenomic RNA.

TGEV homologous regions            Predicted size (a) of the body sequence      RNA species
Base distance       Adjacent       Nucleotide data (b)   Experimental data (c)
from the 3’ end     ORF

8300                E2             4.5                   4.5                    2
8150                -              4.4                   -                      Not detected
3780                X2a            1.0                   1.1                    3
2760                X1             0.3                   0.4                    4
2470                E1             0.8                   0.7                    5
1670                N              1.2                   1.8                    6
510                 X3             0.5                   -                      Not detected

(a) In kilobases.
(b) Distance between the two closest homologous sequences.
(c) Difference of size between an RNA species and the next smaller one as measured in a denaturing gel.

Fig. 6. Compared organization of the genome of three coronaviruses: porcine TGEV, murine MHV and avian IBV. An
encircled number or letter placed on the left of a sequence segment indicates the encoding RNA species. The genes coding for
the three major structural proteins (peplomer E2, membrane E1 and nucleocapsid N) are represented by hatched boxes. The
diagrams of the MHV and IBV genomes have been constructed using data from [1, 28, 29].


Potential primary translation products of 
mRNAs 4 and 3 


The X1 ORF, encoded by the 5’ sequences of 
mRNA 4, potentially directs the synthesis of an 82 
amino acid long polypeptide of 9241 Da, which 
appears to be extremely hydrophobic (Fig. 7). 
Its composition is very unusual with 32% 
leucine + isoleucine residues. The codon usage of 
X1 does not differ from that of the structural pro- 
tein genes (data not shown). In particular, codon 
ATC is infrequently used for isoleucine (1/14), a
bias which would not be expected from a chance 
ORF. The first available ATG is in an unfavorable 
context (CxxAUGA) for translation initiation [31]. 
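The hydrophilicity profile referred to for X1 (Fig. 7) is a running average of per-residue Hopp & Woods values taken over seven residues. A minimal Python sketch of that calculation is given below. The scale values are those of the published Hopp & Woods table; the function name and the test peptide are hypothetical, and the actual X1 sequence is not used.

```python
# Hopp & Woods hydrophilicity values, one-letter amino acid code.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_profile(peptide, window=7):
    """Return the running average of Hopp & Woods values along a peptide.

    One value is produced per full window, so a peptide of n residues
    yields n - window + 1 points. Negative averages indicate hydrophobic
    stretches, positive averages hydrophilic ones.
    """
    values = [HOPP_WOODS[aa] for aa in peptide.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical peptide: a Leu/Ile-rich stretch followed by charged residues.
example = "MLLILLVIAKEEDR"
profile = hydrophilicity_profile(example)
print([round(v, 2) for v in profile])
```

For a polypeptide as rich in leucine and isoleucine as X1, most windows would be expected to average well below zero, which is what "extremely hydrophobic" means on this scale.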

The mRNA 3 potentially allows the synthesis of two products, X2a and X2b, 71 and 165 amino
acids long, respectively. Both ORFs have ATG codon flanking sequences (TxxAUGG, TxxAUGT)
which function poorly as initiation signals [31]. Their codon usage suggests that they are not chance
ORFs (data not shown). The hydrophilicity profile of X2a (7711 Da) did not reveal any special
feature. The X2b product (18 833 Da) was shown to be hydrophobic overall, with a markedly acidic
C-terminus comprising a cluster of 4 glutamic acid residues (position 1180, Fig. 5). As pointed out, the
sequence of 2 of the 4 clones spanning this region predicted an alternative product, X2b’, 67 amino
acids shorter at the C-terminus than X2b (X2b’: 11413 Da). This finding might reflect a heterogeneity
of the virus population, although a cloning artifact cannot be ruled out completely.
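The ORF lengths and predicted product sizes quoted in the text can be cross-checked by simple arithmetic: dividing the number of bases by three gives the number of residues, and dividing the predicted mass by that number gives a mean residue mass. The Python sketch below does this for the four products. It assumes that the quoted ORF lengths exclude the stop codon, which is what the X1 figures (246 bases, 82 residues) imply; the names used are illustrative.

```python
# ORF length in bases and predicted product mass in Da, as quoted in the text.
ORFS = {
    "X1": (246, 9241),
    "X2a": (213, 7711),
    "X2b": (495, 18833),
    "X2b'": (294, 11413),
}


def residues_from_bases(n_bases):
    """Number of amino acids encoded by n_bases of sense codons (no stop)."""
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return n_bases // 3


residues = {name: residues_from_bases(bases) for name, (bases, _) in ORFS.items()}

for name, (bases, mass_da) in ORFS.items():
    n = residues[name]
    print(f"{name}: {bases} bases, {n} residues, "
          f"{mass_da / n:.1f} Da per residue on average")

# The truncated X2b' product should be 67 residues shorter than X2b.
print("X2b - X2b':", residues["X2b"] - residues["X2b'"], "residues")
```

The residue counts obtained this way match those given for X1, X2a and X2b, and the difference between X2b and X2b’ matches the 67-residue truncation attributed to the 13-base deletion.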

It is presently difficult to reconcile the above information with experimental data available for
TGEV. In vitro translation of mRNA 3 produced a 24 kDa polypeptide which neither comigrated
with any intracellular viral protein nor could be immunoprecipitated with anti-virion protein
antibodies [14]. A 16-17 kDa non-structural polypeptide, which was unglycosylated and which
induced a late antibody response in the host, has been characterized in TGEV-infected cells [32].
A non-structural polypeptide of similar Mr (15 kDa) has been observed in our laboratory, but the
latter was shown to incorporate [35S]cysteine (B. Delmas & H. Laude, unpublished results), whereas
no cysteine residue is predicted in X2b. Finally, no smaller polypeptide with an Mr approaching that of X1 or
X2a has been identified so far.

Computer investigations revealed no convincing homologies at the DNA or protein level between
the TGEV X1 or X2a sequences and the ‘non-structural’ genes of IBV [29, 33] and MHV [28, 34]
(data not shown). However, the TGEV X1 product (Fig. 7) and the highly hydrophobic 7.5 kDa
polypeptide predicted by the sequence of IBV mRNA B [33] might have a common (yet unknown)
function. In addition, TGEV X2b shows some similarities with the IBV 12.4 kDa (mRNA D) and
MHV 10.2 kDa (mRNA 5) translation products [29, 28]. They are all produced from a downstream
ORF, are hydrophobic overall except for the C-terminus and have an unusually high tyrosine
content (7-10%). A low sequence homology between these IBV and MHV polypeptides has already
been pointed out [29]. In conclusion, the marked resemblance between the structural polypeptides of
coronaviruses does not extend to the above-mentioned gene products. Some of them may prove
to be key factors in the virus cycle, for instance in transcription-replication switching. One way to
achieve their characterization would be to use antisera directed against synthetic peptides derived
from the sequence so as to facilitate their identification in infected cell extracts.

Fig. 7. Hydrophilicity plot of the predicted X1 polypeptide (hydrophilic index versus amino acid position).
Running average taken over a heptapeptide using the values of Hopp & Woods [30].